{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "agmT3hrjsffX"
      },
      "source": [
        "# Gemini API: Embedding Quickstart with REST\n",
        "\n",
        "<table align=\"left\">\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/google-gemini/cookbook/blob/main/quickstarts/rest/Embeddings_REST.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
        "  </td>\n",
        "</table>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "JMNKdTpTGZET"
      },
      "source": [
        "This notebook provides quick code examples that show you how to get started generating embeddings using `curl`.\n",
        "\n",
        "You can run this in Google Colab, or you can copy/paste the `curl` commands into your terminal.\n",
        "\n",
        "To run this notebook, your API key must be stored in a Colab Secret named `GOOGLE_API_KEY`. If you are running in a different environment, you can store your key in an environment variable instead. See [Authentication](https://github.com/google-gemini/cookbook/blob/main/quickstarts/Authentication.ipynb) to learn more."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {
        "id": "R-Vw_mOM_WD0"
      },
      "outputs": [],
      "source": [
        "import os\n",
        "from google.colab import userdata"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {
        "id": "wCkLTpb3oTXE"
      },
      "outputs": [],
      "source": [
        "os.environ['GOOGLE_API_KEY'] = userdata.get('GOOGLE_API_KEY')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "tjGqGBZ9yARd"
      },
      "source": [
        "## Embed content\n",
        "\n",
        "Call the `embedContent` method with the `text-embedding-004` model to generate text embeddings:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "metadata": {
        "id": "eA7I_Ww8IETn"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "{\n",
            "  \"embedding\": {\n",
            "    \"values\": [\n",
            "      0.013168523,\n",
            "      -0.008711934,\n",
            "      -0.046782676,\n",
            "      0.00069968984,\n",
            "      -0.009518873,\n",
            "      -0.008720178,\n",
            "      0.060103577,\n"
          ]
        }
      ],
      "source": [
        "%%bash\n",
        "\n",
        "# Embed one piece of text; head prints only the first 10 lines of the response.\n",
        "curl \"https://generativelanguage.googleapis.com/v1beta/models/text-embedding-004:embedContent?key=$GOOGLE_API_KEY\" \\\n",
        "-H 'Content-Type: application/json' \\\n",
        "-d '{\"model\": \"models/text-embedding-004\",\n",
        "    \"content\": {\n",
        "    \"parts\":[{\n",
        "      \"text\": \"Hello world\"}]}}' 2> /dev/null | head"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "x7ngWdZ7yDHp"
      },
      "source": [
        "## Batch embed content\n",
        "\n",
        "To embed several prompts efficiently, send them together in one `batchEmbedContents` call.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 4,
      "metadata": {
        "id": "Z0b35xv5Ja_d"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "{\n",
            "  \"embeddings\": [\n",
            "    {\n",
            "      \"values\": [\n",
            "        -0.010632277,\n",
            "        0.019375855,\n",
            "        0.0209652,\n",
            "        0.0007706424,\n",
            "        -0.061464064,\n",
            "--\n",
            "        -0.0071538696,\n",
            "        -0.028534694\n",
            "      ]\n",
            "    },\n",
            "    {\n",
            "      \"values\": [\n",
            "        0.018467998,\n",
            "        0.0054281196,\n",
            "        -0.017658804,\n",
            "        0.013859266,\n",
            "        0.053418662,\n",
            "--\n",
            "        0.026714385,\n",
            "        0.0018762538\n",
            "      ]\n",
            "    },\n",
            "    {\n",
            "      \"values\": [\n",
            "        0.05808907,\n",
            "        0.020941721,\n",
            "        -0.108728774,\n",
            "        -0.04039259,\n",
            "        -0.04440443,\n"
          ]
        }
      ],
      "source": [
        "%%bash\n",
        "\n",
        "# Embed three prompts in one request; grep -C 5 prints 5 lines of context around each values match.\n",
        "curl \"https://generativelanguage.googleapis.com/v1beta/models/text-embedding-004:batchEmbedContents?key=$GOOGLE_API_KEY\" \\\n",
        "-H 'Content-Type: application/json' \\\n",
        "-d '{\"requests\": [{\n",
        "      \"model\": \"models/text-embedding-004\",\n",
        "      \"content\": {\n",
        "      \"parts\":[{\n",
        "        \"text\": \"What is the meaning of life?\"}]}},\n",
        "      {\n",
        "      \"model\": \"models/text-embedding-004\",\n",
        "      \"content\": {\n",
        "      \"parts\":[{\n",
        "        \"text\": \"How much wood would a woodchuck chuck?\"}]}},\n",
        "      {\n",
        "      \"model\": \"models/text-embedding-004\",\n",
        "      \"content\": {\n",
        "      \"parts\":[{\n",
        "        \"text\": \"How does the brain work?\"}]}}]}' 2> /dev/null | grep -C 5 values"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "nPBk2k4xuql8"
      },
      "source": [
        "## Set the output dimensionality\n",
        "\n",
        "If you're using `text-embedding-004`, you can set the `output_dimensionality` parameter to create smaller embeddings.\n",
        "\n",
        "* `output_dimensionality` truncates the embedding (e.g., `[1, 3, 5]` becomes `[1, 3]` when `output_dimensionality=2`).\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 5,
      "metadata": {
        "id": "ny3bOQK1ut2_"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "{\n",
            "  \"embedding\": {\n",
            "    \"values\": [\n",
            "      0.013168523,\n",
            "      -0.008711934,\n",
            "      -0.046782676,\n",
            "      0.00069968984,\n",
            "      -0.009518873,\n",
            "      -0.008720178,\n",
            "      0.060103577,\n"
          ]
        }
      ],
      "source": [
        "%%bash\n",
        "\n",
        "# Request a 256-dimensional embedding; head prints only the first 10 lines of the response.\n",
        "curl \"https://generativelanguage.googleapis.com/v1beta/models/text-embedding-004:embedContent?key=$GOOGLE_API_KEY\" \\\n",
        "-H 'Content-Type: application/json' \\\n",
        "-d '{\"model\": \"models/text-embedding-004\",\n",
        "    \"output_dimensionality\":256,\n",
        "    \"content\": {\n",
        "    \"parts\":[{\n",
        "      \"text\": \"Hello world\"}]}}' 2> /dev/null | head"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ObAdUvlk9x05"
      },
      "source": [
        "## Use `task_type` to indicate how you'll use the embeddings\n",
        "\n",
        "Besides `output_dimensionality`, the `embedContent` method takes four parameters:\n",
        "\n",
        "* `model`: Required. The embedding model to use. This notebook uses `models/text-embedding-004`.\n",
        "* `content`: Required. The content that you would like to embed.\n",
        "* `task_type`: Optional. The task type for which the embeddings will be used. See below for possible values.\n",
        "* `title`: Optional. The title of the document being embedded. Can only be set when `task_type` is `RETRIEVAL_DOCUMENT`.\n",
        "\n",
        "`task_type` provides a hint to the API about how you intend to use the embeddings in your application.\n",
        "\n",
        "The following `task_type` values are accepted:\n",
        "\n",
        "* `TASK_TYPE_UNSPECIFIED`: If you do not set the value, it defaults to `RETRIEVAL_QUERY`.\n",
        "* `RETRIEVAL_QUERY`: The given text is a query in a search/retrieval setting.\n",
        "* `RETRIEVAL_DOCUMENT`: The given text is a document from the corpus being searched.\n",
        "* `SEMANTIC_SIMILARITY`: The given text will be used for Semantic Textual Similarity (STS).\n",
        "* `CLASSIFICATION`: The given text will be classified.\n",
        "* `CLUSTERING`: The embeddings will be used for clustering.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 6,
      "metadata": {
        "id": "NwzsJmRrAo-t"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "{\n",
            "  \"embedding\": {\n",
            "    \"values\": [\n",
            "      0.060187872,\n",
            "      -0.031515103,\n",
            "      -0.03244149,\n",
            "      -0.019341845,\n",
            "      0.057285223,\n",
            "      0.037159503,\n",
            "      0.035636507,\n"
          ]
        }
      ],
      "source": [
        "%%bash\n",
        "\n",
        "# Embed the text as a retrieval document and attach a title.\n",
        "curl \"https://generativelanguage.googleapis.com/v1beta/models/text-embedding-004:embedContent?key=$GOOGLE_API_KEY\" \\\n",
        "-H 'Content-Type: application/json' \\\n",
        "-d '{\"model\": \"models/text-embedding-004\",\n",
        "    \"content\": {\n",
        "    \"parts\":[{\n",
        "      \"text\": \"Hello world\"}]},\n",
        "    \"task_type\": \"RETRIEVAL_DOCUMENT\",\n",
        "    \"title\": \"My title\"}' 2> /dev/null | head"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "jXkRYBhbB_b2"
      },
      "source": [
        "## Learning more\n",
        "\n",
        "* Learn more about `text-embedding-004` in the [announcement blog post](https://developers.googleblog.com/2024/04/gemini-15-pro-in-public-preview-with-new-features.html).\n",
        "* See the [REST API reference](https://ai.google.dev/api/rest) for details on each method.\n",
        "* Explore more examples in the cookbook.\n"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "name": "Embeddings_REST.ipynb",
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
